<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta NAME="description" CONTENT="Extended SMILES and SMARTS in Marvin">
<meta NAME="keywords" CONTENT="Extended SMILES, SMARTS, Java, Marvin">
<meta NAME="author" CONTENT="Andras Volford">
<link REL ="stylesheet" TYPE="text/css" HREF="../marvinmanuals.css" TITLE="Style">
<title>Extended SMILES in Marvin</title>
</head>
<body>

<h1>Extended SMILES, SMARTS</h1>

<p>
Codenames: <strong>cxsmiles</strong>, <strong>cxsmarts</strong>
</p>
<h2>Contents:</h2>
<ul>
<li><a href="#cxsmiles">Extended SMILES, SMARTS format</a></li>
<li><a href="#ioptions">Import options</a></li>
<li><a href="#options">Export options</a></li>
</ul>
<p>
<h2><a class="anchor" name="cxsmiles">Extended SMILES, SMARTS format</a></h2>
ChemAxon Extended SMILES/SMARTS stores special features 
of a molecule after its <a HREF="smiles-doc.html">SMILES</a> string.
Additional information can follow the SMILES string 
if it is separated from it by space or tab characters, because SMILES parsers 
either ignore such trailing text or treat it as a comment. 
The extended features are stored in the following format:<br>
<code>SMILES_String |&lt;feature1&gt;,&lt;feature2&gt;,...|</code><br>
The extended feature description is compact: 
if a feature is absent from the molecule, the corresponding special 
characters are not written. 
(E.g. if none of the atoms of the molecule has an alias string, 
no &quot;$&quot; and &quot;;&quot; characters are written.) 
Moreover, if the molecule has no feature to write, 
the extended feature field is omitted altogether.<br>

Please note that the SMILES string part generated in cxsmiles format is not 
always the same as the one generated by smiles output. For example, the 
coordinate bonds of ferrocene are not exported to plain SMILES 
(<code>[Fe].c1cccc1.c1cccc1</code>), but they appear in the cxsmiles 
(<code>c12c3c4c5c1[Fe]23451234c5c1c2c3c45 |C:4.5,0.6,1.7,2.8,3.9,7.12,6.10,9.16,10.18,8.14|</code>).
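<p>
The layout described above can be illustrated with a minimal Python sketch that 
separates the SMILES part from the extended feature field. The function name 
<code>split_cxsmiles</code> is hypothetical and is not part of Marvin; the sketch 
also assumes that the feature text itself contains no &quot;|&quot; character.
</p>

```python
def split_cxsmiles(line):
    """Split 'SMILES |features|' into (smiles, features).

    Returns None for the features when no extended field is present.
    Assumption: the feature text contains no '|' character.
    """
    parts = line.strip().split(None, 1)  # split at the first run of spaces/tabs
    if not parts:
        return "", None
    smiles = parts[0]
    rest = parts[1].strip() if len(parts) > 1 else ""
    if rest.startswith("|"):
        end = rest.find("|", 1)  # position of the closing bar
        if end != -1:
            return smiles, rest[1:end]
    return smiles, None


if __name__ == "__main__":
    ferrocene = ("c12c3c4c5c1[Fe]23451234c5c1c2c3c45 "
                 "|C:4.5,0.6,1.7,2.8,3.9,7.12,6.10,9.16,10.18,8.14|")
    print(split_cxsmiles(ferrocene))
    print(split_cxsmiles("[Fe].c1cccc1.c1cccc1"))
```

<p>
For a plain SMILES string without an extended field, the second element of the 
returned pair is <code>None</code>.
</p>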

<p>
Extended SMILES export writes the following additional features:
<ul>

<li>All aromatic atoms are exported with lowercase letters 
in the SMILES string part.<br>
E.g. aromatic boron is written with a lowercase letter: 
b1ccccc1.
</li>

<li>Molecule absolute stereoconfiguration (For detailed
description see the <a href="http://www.chemaxon.com/jchem/doc/user/query_stereochemistry.html">
Stereochemistry</a> section of the Query guide in JChem Base.)
<p>
The relative stereoconfiguration is stored as &quot;<b>r</b>&quot;. 
The absolute stereoconfiguration is the default, so it is not marked.
(The absolute stereoconfiguration is also known as the &quot;Chiral flag&quot; in MDL molfiles.)
</p>
</li>


<li>Enhanced stereochemical representation (For detailed
description see the <a href="http://www.chemaxon.com/jchem/doc/user/query_stereochemistry.html">
Stereochemistry</a> section of the Query guide in JChem Base.)
<p>
The following stereochemical group types are stored:
</p>
    <ul>
	<li>Absolute stereo group type. <br>
	    <b>a:</b>&lt;atomindex&gt;,&lt;atomindex&gt;...
	</li>
	<li>OR stereo group type.<br>
	    <b>o</b>&lt;group&gt;<b>:</b>&lt;atomindex&gt;,&lt;atomindex&gt;...</li>
	<li>AND stereo group type.<br>
	    <b>&amp;</b>&lt;group&gt;<b>:</b>&lt;atomindex&gt;,&lt;atomindex&gt;...</li>
    </ul>
</li>


<li>Atom labels / aliases / values
<p>
Atom labels / aliases are written between &quot;<b>$</b>&quot; characters; 
the labels are separated by &quot;;&quot; characters.<br>
Atom values are written after &quot;<b>$_AV:</b>&quot;, separated by 
semicolon characters and closed with a &quot;<b>$</b>&quot; character.
</p>
</li>

<li>Single &quot;Up or Down&quot; (Wiggly), UP and DOWN bonds
<p>
Wiggly bonds are written after &quot;<b>w:</b>&quot;. Each entry consists of 
an atom index followed by a dot character and the wiggly bond index.
The entries are separated by commas.<br>
If atomic coordinates are also exported, then UP bonds are written
after &quot;<b>wU:</b>&quot; and 
DOWN bonds are written after &quot;<b>wD:</b>&quot;, in the same way as
wiggly bonds.
</p>
</li>

<li>CIS, TRANS, UNSPEC bond info for double bonds in rings
<p>
The bond indexes of the double bonds in the SSSR are written. <br>
The bond stereo information is generated as follows: 
the double bond is represented as a1-a2=a3-a4, where <br>
</p>
<UL>
<LI> a1 is the atom with the smallest index in the generated smiles that is connected to a2
<LI> a2 is the double bond atom with the smaller index in the generated smiles
<LI> a3 is the double bond atom with the larger index in the generated smiles
<LI> a4 is the atom with the smallest index in the generated smiles that is connected to a3
</UL>
The CIS double bond indexes are written after &quot;<b>c:</b>&quot;, <br>
the TRANS double bond indexes are written after &quot;<b>t:</b>&quot;, <br>
the double bond indexes with UNSPEC flag are written after &quot;<b>u:</b>&quot;.
</li>

<li>Fragment level grouping of reactant, agent and product fragments<br>
    Grouped fragment indexes are written after &quot;<b>f:</b>&quot; in the following
    format:<br>
    <UL>
    <LI> Connected groups are separated by &quot;,&quot;.
    <LI> A connected group is a &quot;.&quot; separated list of fragment indices.
    </UL><br>
    Example: &quot;f:0.1,5.6&quot;
</li>

<li>Local parity information
<p>
Atom indexes with local ODD parity are written after &quot;<b>@:</b>&quot;,
while atom indexes with local EVEN parity are written 
after &quot;<b>@@:</b>&quot;. 
In both cases the indexes are separated by commas.
</p>
</li>

<li>Radical numbers
<p>
The atom indexes of radical centers are written, separated by commas, 
after a marker that depends on the radical type:
<ul>
<li> monovalent radical center: &quot;<b>^1:</b>&quot;,
<li> divalent radical center: &quot;<b>^2:</b>&quot;,
<li> divalent singlet radical center: &quot;<b>^3:</b>&quot;,
<li> divalent triplet radical center: &quot;<b>^4:</b>&quot;,
<li> trivalent radical center: &quot;<b>^5:</b>&quot;.
</ul>
</li>

<li>Lone electron pairs
<p>
The indexes of the atoms having bond connected lone electron pairs are 
written after 
&quot;<b>LP:</b>&quot;.
</p>
<p>
The indexes of the atoms that have explicit lone electron pairs are written after 
&quot;<b>lp:</b>&quot;; each atom index is followed by a colon character and the 
number of explicit lone electron pairs.<br>
(See <a href="http://www.chemaxon.com/cengage/marvin/examples/applets/sketch/studentexam/index.html">live example</a>.)
</p>
<p>
Example: &quot;LP:1,lp:0:1,2:2&quot;
</p>
</li>

<li>Multicenter SGroups and coordinate bonds
<p>
Multicenter atom indexes are written after &quot;<b>m:</b>&quot;. 
Each multicenter atom index is followed by a colon character and the indexes 
of the atoms which form the given SGroup, separated by &quot;<b>.</b>&quot;. 
The SGroups are separated by commas.
</p>
<p>
Coordinate bonds are written after 
&quot;<b>C:</b>&quot;. Each entry consists of an atom index followed by a dot character 
and the coordinate bond index.
The entries are separated by commas.<br>
In the smiles part of cxsmiles the atom-to-atom coordinate bonds are 
represented by single bonds, which are corrected according to the 
C information in the extended part.
</p>
Example: &quot;m:0:7.6.5.4.3,2:12.11.10.9.8,C:0.0,2.1&quot;
</li>

<li>Link nodes
<p>
Link nodes are written after &quot;<b>LN:</b>&quot;. 
Each link node atom index is followed by a colon character, then the minimum repetitions, 
the maximum repetitions and the first and second outer atom indexes of the node, 
separated by &quot;<b>.</b>&quot;.
If the link node has only two connections, then the first and second outer atom 
indexes are obvious, so they are omitted.
The link nodes are separated by commas.
</p>
Example: &quot;LN:1:1.5.3.0,6:1.2.7.5,9:1.10.10.8&quot;
</li>

<li>Atomic coordinates
<p>
The atomic coordinates are written between parentheses. 
The coordinate triplets (x, y, z) of the atoms are separated by semicolons, and the
x, y and z coordinates within a triplet are separated by commas. Zero coordinates are omitted.<br>
Note: The CIS/TRANS information is redundant in this case, as it is specified 
both in the SMILES string and in the atomic coordinates. The 
atomic coordinates have priority over the SMILES string.
</p>
</li>
<li>
    Data Sgroup information
    <p>
        The atom indexes of the data sgroup are written after
        &quot;<b>SgD:</b>&quot;, followed by the
        field name, data value, query operator, unit, tag
        and, if necessary, coordinates in parentheses, separated by
        colon characters. If atomic coordinates are exported (with the
        <b>c</b> option), (-1) is used in the coordinate field
        for data sgroups attached to the atoms.<br>
            Example: &quot;SgD:3,2,1,0:name:data:like:unit:t:(-1)&quot;
    </p>
</li>
<li>
    Attachment point information
    <p>
    The atom indexes of the attachment points are written after
    &quot;<b>AP_x:</b>&quot;, where x denotes the attachment point type
    (1 or 2), separated by commas.<br>
    Example: &quot;AP_1:10,AP_2:3&quot;
    </p>
</li>
</ul>
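<p>
As an illustration of the index notations listed above, the following Python 
sketch decodes three of the simpler features: fragment grouping 
(&quot;f:&quot;), atom/bond index pairs as used by &quot;C:&quot; and 
&quot;w:&quot;, and explicit lone pair counts (&quot;lp:&quot;). The function 
names are hypothetical and are not part of Marvin, and each function assumes 
that a single feature has already been isolated from the extended field.
</p>

```python
def parse_fragment_groups(feature):
    """'f:0.1,5.6' -> [[0, 1], [5, 6]]

    Groups are separated by commas, fragment indexes inside a group by dots.
    """
    body = feature.split(":", 1)[1]
    return [[int(i) for i in group.split(".")] for group in body.split(",")]


def parse_atom_bond_pairs(feature):
    """'C:0.0,2.1' -> [(0, 0), (2, 1)]

    Each entry is an atom index, a dot and a bond index.
    """
    body = feature.split(":", 1)[1]
    pairs = []
    for entry in body.split(","):
        atom, bond = entry.split(".")
        pairs.append((int(atom), int(bond)))
    return pairs


def parse_lone_pair_counts(feature):
    """'lp:0:1,2:2' -> {0: 1, 2: 2}

    Each entry is an atom index, a colon and the number of lone pairs.
    """
    body = feature.split(":", 1)[1]
    counts = {}
    for entry in body.split(","):
        atom, count = entry.split(":")
        counts[int(atom)] = int(count)
    return counts


if __name__ == "__main__":
    print(parse_fragment_groups("f:0.1,5.6"))
    print(parse_atom_bond_pairs("C:0.0,2.1"))
    print(parse_lone_pair_counts("lp:0:1,2:2"))
```

<p>
Splitting a complete extended field into its individual features is deliberately 
left out of this sketch, because commas also occur inside several features 
(for example inside atomic coordinates).
</p>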
<p>

<h3><a class="anchor" NAME="ioptions">Import options</a></h3>
<p>
    See <a HREF="smiles-doc.html#ioptions">SMILES import options</a>.

<p>
<h3><a class="anchor" NAME="options">Export options</a></h3>

<p>
Export options can be specified in the format string. The format descriptor
and the options are separated by a colon. 
All options have default values (see below).
With a &quot;+&quot; or &quot;-&quot; sign, the default value of an option 
can be changed to &quot;true&quot; or &quot;false&quot;, respectively. 
If an option is given without a &quot;+&quot; or &quot;-&quot; modifier, then the 
default values are not used and only the specified features are exported.
<br>
Examples: <br>
&quot;cxsmiles:&quot; writes all default features 
(absolute stereoconfiguration, enhanced
stereo features, atom labels, wiggly bond indexes, ring stereo bond info and
reaction fragment level grouping),<br>
&quot;cxsmiles:lc&quot; writes the atom labels and the atomic coordinates only,<br>
&quot;cxsmiles:+c&quot; writes all default features and the atomic coordinates,<br>
&quot;cxsmiles:-lw&quot; writes absolute stereoconfiguration, enhanced
stereo features, ring stereo bond info and
reaction fragment level grouping, but not atom labels and 
wiggly bond indexes.

<blockquote>
<table CELLSPACING=0 CELLPADDING=0 border="0">
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_e"><strong>e</strong></a>&nbsp;&nbsp;&nbsp;&nbsp;</td>
    <td>Write relative stereo configuration and enhanced stereo features. Default value: <i>true</i>.
    </td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_l"><strong>l</strong></a></td>
    <td>Write atom labels / aliases / values. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_w"><strong>w</strong></a></td>
    <td>Write wiggly bond indexes and, if atomic coordinates are exported, also
    UP and DOWN bond indexes. Default value: <i>true</i>.  </td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_d"><strong>d</strong></a></td>
    <td>Write CIS, TRANS ring bond indexes. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_f"><strong>f</strong></a></td>
    <td>Reaction fragment level grouping. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_p"><strong>p</strong></a></td>
    <td>Write local parities. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_R"><strong>R</strong></a></td>
    <td>Write radical numbers. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_LL"><strong>L</strong></a></td>
    <td>Write lone electron pairs. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_m"><strong>m</strong></a></td>
    <td>Write multicenter SGroups and coordinate bonds. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_N"><strong>N</strong></a></td>
    <td>Write link nodes. Default value: <i>true</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_c"><strong>c</strong></a>[p]</td>
    <td>Write atomic coordinates. 
    <i>p</i> can optionally specify the coordinate precision.
    If <i>p</i> is not specified, a precision of 2 is used.  
    Default value: <i>false</i>.</td></tr>
<tr VALIGN="TOP">
    <td><a class="text" NAME="option_D"><strong>D</strong></a></td>
    <td>Write Data Sgroup information.
    Default value: <i>true</i>.</td></tr>
</table>
</blockquote>
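<p>
The way the &quot;+&quot; and &quot;-&quot; modifiers interact with the default 
values in the table above can be sketched in Python as follows. This is only an 
illustration of the rules as described here, not Marvin's implementation: the 
function name <code>resolve_export_options</code> is hypothetical, the sketch 
handles only the options listed in the table, and it assumes that a sign applies 
to every option letter that follows it until the next sign.
</p>

```python
# Default values taken from the option table above.
DEFAULTS = {"e": True, "l": True, "w": True, "d": True, "f": True, "p": True,
            "R": True, "L": True, "m": True, "N": True, "c": False, "D": True}
DEFAULT_PRECISION = 2  # coordinate precision used when c has no [p] suffix


def resolve_export_options(format_string):
    """Return (format name, {option: enabled}, coordinate precision)."""
    name, _, opts = format_string.partition(":")
    enabled = dict(DEFAULTS)
    precision = DEFAULT_PRECISION
    sign = None       # last "+" or "-" seen (assumed to apply to later letters)
    cleared = False   # True once an unsigned option has discarded the defaults
    i = 0
    while i < len(opts):
        ch = opts[i]
        i += 1
        if ch in "+-":
            sign = ch
            continue
        if ch not in DEFAULTS:
            raise ValueError("option not covered by this sketch: " + ch)
        if sign is None and not cleared:
            # An option without a modifier: defaults are not used at all.
            enabled = dict.fromkeys(DEFAULTS, False)
            cleared = True
        enabled[ch] = sign != "-"
        if ch == "c":
            # Optional precision digits directly after "c".
            start = i
            while i < len(opts) and opts[i].isdigit():
                i += 1
            if i > start:
                precision = int(opts[start:i])
    return name, enabled, precision


if __name__ == "__main__":
    for fmt in ("cxsmiles:", "cxsmiles:lc", "cxsmiles:+c", "cxsmiles:+c4"):
        name, enabled, precision = resolve_export_options(fmt)
        on = "".join(k for k, v in enabled.items() if v)
        print(fmt, "->", on, "precision", precision)
```

<p>
With these rules &quot;cxsmiles:lc&quot; enables only <b>l</b> and <b>c</b>, 
while &quot;cxsmiles:+c&quot; keeps every default and adds <b>c</b>.
</p>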
<p>
    See also <a HREF="smiles-doc.html#options">SMILES export options</a>
    and <a HREF="basic-export-opts.html">basic export options</a>.

<p>
<h2>See also</h2>
<ul>
<li><a HREF="smiles-doc.html">SMILES and SMARTS</a></li>
<li><a HREF="../../examples/applets/sketch/studentexam/index.html">Explicit lone pair live example</a></li>
</ul>

</body>
</html>
